Conjugate gradient heatbath for ill-conditioned actions 
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We present a method for performing sampling from a Boltzmann distribution of an ill-conditioned 
quadratic action. This method is based on heatbath thermalization along a set of conjugate direc- 
tions, generated via a conjugate-gradient procedure. The resulting scheme outperforms local updates 
for matrices with very high condition number, since it avoids the slowing down of modes with lower 
eigenvalue, and has some advantages over the global heatbath approach, compared to which it is 
more stable and allows for more freedom in devising case-specific optimizations. 
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A common problem in many branches of statistical 
physics is the sampling of distributions of the type p oc 
exp (— ix^x) where A is a positive definite N x N ma- 
trix and the random variable x an TV-dimensional vector. 
Areas in which such sampling is needed are for instance 
QCD 0,0,0] and a recently developed linear scaling elec- 
tronic structure method [J, Q. In principle sampling p is 
straightforward, if diagonalizing A is an option. How- 
ever, in many cases, N is so large that circumventing the 
O (A' 3 ) diagonalization step becomes mandatory. Dif- 
ferent approaches have been proposed. In the so-called 
global heatbath method one writes A = M T M , and 
obtains a series of statistically independent vectors by 
solving the linear system J^lx = R-j where R is a vector 
whose components are distributed according to a Gaus- 
sian with zero mean and unit variance (-R 2 ) = 1. The ad- 
vantage of this method is that the algorithmic complexity 
of the problem can be reduced by using an iterative solver 
for the linear system. In order to expedite sampling a 
Metropolis-like criterion has been suggested that leads 
to correct sampling without having to bring the iterative 
process to full convergence [fl 0|- Unfortunately, when 
the ratio between the largest and smallest eigenvalues 
is large (ill-conditioned matrices) the acceptance of this 
scheme drops to zero unless full convergency is achieved. 
An alternative approach is the local heatbath algorithm, 
in which at every step one single component of the state 
vector x is thermalized in turn, keeping the others fixed. 
It has been pointed out elsewhere [a, [9J that there is a 
close analogy between this second method and the Gauss- 
Seidel minimization technique. This approach is rela- 
tively inexpensive, but becomes very inefficient when the 
condition number of A is large, and even more inefficient 
when the observable of interest depends strongly on the 
eigenvectors corresponding to smaller eigenvalues. 

In this paper we propose a heatbath algorithm in which 
moves are performed along mutually conjugated direc- 
tions. This choice is based on the analogy between vari- 
ous heatbath methods (see e.g. Ref. [f|) and directional 
minimization techniques. We show both analytically and 



numerically that the choice of conjugate directions allows 
all the degrees of freedom to become decorrelated on the 
same time scale, independent of their associated eigen- 
value. We also discuss the cases in which the improved 
efficiency outbalances the additional computational cost. 
Our method can be interpreted as the subdivision of the 
global heatbath matrix inversion process into N interme- 
diate steps, all of which guarantee an exact sampling of 
the probability distribution. 

In section U we introduce a simple formalism to treat 
heatbath moves along general directions, discuss the 
properties of a sweep through a set of conjugate direc- 
tions, and describe a couple of algorithms to obtain such 
a set with reasonable effort. In section [TT| we present 
some numerical tests on a model action and compare 
the efficiency of conjugate directions heatbath with local 
moves for a model observable. In section [nil we compare 
our method with global heatbath, and in section IIVI we 
present our conclusions. 



I. COLLECTIVE MODES HEATBATH 



Given a probability distribution 



P(x) oc exp 



(1) 



a generic heatbath algorithm can be described as a 
stochastic process in which the vector x (t + 1) is related 
to the vector at the previous step x (t) by 



x(i + 1) =x(i) + Td, 
where d is a direction in the x space and 



d(Ax b) 
dAd 



[fid A d)~ 1/2 R 



(2) 



(3) 
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where R is a Gaussian random number with zero mean 
and unitary spread (R 2 ) = 1, and (3 is the inverse tem- 
perature at which the sampling is performed. The ap- 
plication of this algorithm does not require inversion of 
the matrix A- The sequence of directions d is rather 
arbitrary, and could be a random sequence or a prede- 
fined deterministic sequence. Strictly speaking, detailed 
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balance is satisfied only if the directions are randomly 
chosen at each step. Nevertheless it has been shown 
in Ref[l(l that correct sampling can be achieved if ev- 
ery Monte Carlo move leaves the equilibrium distribu- 
tion unchanged. In Appendix |A~1 we show that this is the 
case, provided that direction d is chosen independently 
from position x. Nevertheless, different choices of direc- 
tions can lead to different sampling efficiency. Our final 
choice will be to select for d a sequence of conjugate di- 
rections (Section ll A[) . However, we shall first analyze the 
choice of random, uncorrelated directions, and a sequen- 
tial sweep along a set of orthogonal directions. 

For the sake of simplicity, we take b = and we choose 
the basis into which A is diagonal, Ay = a;(5y. Since 
these properties arc subsequently never used, no loss of 
generality is implied. To compare the efficiency of the 
different choices of directions we shall consider the au- 
tocorrelation matrix for the components along the eigen- 
modes (oci (0) Xj (t)). A quantitative measure of the speed 
of decorrelation of (xt (0) Xj (t)) can be obtained from its 
slope at the origin. Since in Monte Carlo one progresses 
in discrete steps, this quantity is given by 

(Xi (0) XJ (1)) = ^(x i (0) 2 )(x j (0) 2 ) [Sij - Ay (d)] 

(4) 

In Eq. J3J we have introduced the normalized slope ten- 
sor A , which can be expressed as a function of the eigen- 
values of A and of the components of d, using equations 
© and ©: 



, > (Xihilij 

( d ) = ^ T2 



X l (0)' 



Xl (or) ( Xj (o) 



(5) 

Therefore, depending on the choice of direction d, the dif- 
ferent components of the vector x decorrelate at different 
speeds. However, since Tr A = 1, the sum of these nor- 
malized speeds does not depend on the direction chosen. 
The same quantity ^ also enters a recursion relation for 
the autocorrelation functions at a generic Monte Carlo 
step t, 



( Xl {Q) X] {t))-Y^ 



{xi (0) Xj (t+l)) = 
( Xi (0) x k (t)) f^ A kj (d] 



(6) 



Use of this equation requires that one appropriately av- 
erages over the direction d, as we shall discuss in the 
following. 

We will begin our analysis from the simpler case, in 
which the direction d is chosen at every step to be equal 
to a stochastic vector R, whose components are dis- 
tributed as Gaussian random numbers with zero mean 
and standard deviation one. The normalized slope tensor 
([5|) in this case results from an average over the possible 



directions. 



(Ay (d = R)) = a. t Sr 



R 2 



N- 



di Sij 

Tr A 



(7) 



The limit expression holds for the size TV of the matrix 
going to infinity (see Appendix |B|) . under the hypothesis 
that the largest eigenvalue of A does not grow with N 
and that Tr A is O (TV) , hypotheses which are relevant to 
many physical problems. Since in this case the direction 
chosen at every step is independent of all the previous 
choices, the same average enters equation ([6]) at any time, 
so that proceeding by induction one can easily obtain the 
entire autocorrelation function, 

{x i (0)x j (t)) = S ij (x i (0) 3 )[l-{A ij )] t (8) 

where (Ay) is the quantity obtained in equation l|T|). 
From ijHJ) we can calculate the autocorrelation time for 
mode i. 



EZo (xi(0)*i (*)> 



jv->oo Tr A 



a. 



In the case of large N, the decorrelation speed of the 
components along normal modes is directly proportional 
to the corresponding eigenvalue, so that in ill-conditioned 
cases a critical slowing down for the softer normal modes 
will be present. 

Let us now consider moves along a predefined set of 
orthogonal directions {u^'} _ Q N _ 1 - This is done to 
mimic the case in which one performs a sweep along 
Cartesian directions. In our reference frame, where A is 
taken to be diagonal, this would be trivial, hence the 
choice of an arbitrarily oriented set of orthogonal di- 
rections. As in standard local hcatbath, the outcome 
will depend on the orientation of the {u^™)} relative to 
the eigenvectors of A. Averaging over all the possible 
choices of initial direction, we find the slope at t = 0, 



(m) (m) 




Obviously, it is not possible to reduce this result to an 
expression which does not depend on the particular set of 
orthogonal directions. However, the following inequality 
holds 



(X i Sij 



Na r , 



< (A,; 



< 



CLiSij 



Na r 



(10) 



Equation <| 1 Oj) does not put rigid constraints on the value 
of (Ay), but demonstrates that also in this case .Ai is 
diagonal and suggests that in real life the convergence will 
be faster for the higher eigenvalues, and that the spread 
in the relaxation speed for different modes is larger when 
the condition number k — a max /a min is higher. 

In the case where directions {u^™'} are swept sequen- 
tially we have not been able to derive a closed expression 
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for (xi (0) Xj (t)) because of the dependence of d (t) on 
the previous history. If, on the other hand, a random di- 
rection is drawn from {u^" 1 ) } at every step, (xi (0) Xj (t)) 
is given by expression (JSJ) where (Ay) has the value in 
equation JS]). 



A. Moves along conjugate directions 

It is clear from equation (J8]) that a random choice of 
the directions d leads to fast decorrelation of the com- 
ponents relative to the eigenvectors with high eigenval- 
ues. On the other hand, the components relative to the 
eigenvectors with low eigenvalues will decorrelate more 
slowly. Similar behavior is expected for the local heat- 
bath method, unless particular relations hold between 
the eigenvectors and the Cartesian axes. If the operator 
A is ill-conditioned, the practical consequence is that the 
slow modes will be accurately sampled only after a very 
large number of steps. As we have already discussed, the 
sum of the decorrelation slopes of the different compo- 
nents does not depend on the choice of the directions d. 
However, with a proper choice of the directions d this 
sum could be spread in a uniform way among the differ- 
ent modes. A similar problem arises in minimization al- 
gorithms based on directional search, and is often solved 
choosing a sequence of conjugated directions [ll| . In the 
same spirit, we can compute the decorrelation speed of 
the different modes when the d's are chosen to be con- 
jugated directions. Let us consider a set of conjugated 
directions {h^}, such that hW A>W = The set 
{h«} can be generated with various algorithms, such as 
a Gram-Schmidt orthogonalization that uses the positive 
definite A matrix as a metric, or a conjugate gradient 
procedure, as described in Section [TBI 

Using the fact that J2 k h\ k ^h^ = a^ 1 5 i j, the slope at 
t = is 




With this choice, the decorrelation slopes of the differ- 
ent modes are independent of the eigenvalue. If one 
chooses one conjugate direction at random at each step 
it is straightforward to show that overall the autocorre- 
lation function decays exponentially as 



(xi (0)x j (t)} = 6 i j(x i (0) 



1 

N 



This derivation shows that if matrix A is ill- 
conditioned and one wishes to decorrelate the slow 
modes, then the choice of performing the heatbath us- 
ing a sequence of conjugated directions can improve the 
sampling quality dramatically. Of course, the slow modes 



are accelerated and the fast modes are decelerated. How- 
ever, it is clear that a completely independent vector x 
is obtained only when all the modes are decorrelated. A 
heatbath on conjugate directions allows all the modes to 
be decorrelated with the same efficiency, irrespective of 
their stiffness. Even better efficiency can be obtained by 
sequentially sweeping a set of conjugated directions. At 
first sight it would appear that the dependence of h (t) 
on h (t — 1) would make it very difficult if not impossible 
to obtain the autocorrelation function in a closed form. 
However, conjugate directions have a redeeming feature. 
If we expand the position vector on the non-orthogonal 
basis {h^™)}, x = J^a'h^, and we evaluate the cor- 
relation matrix between the contravariant components 
a 1 , we find that (a 1 a 3 ) = <5y. This property can be 
easily demonstrated taking into account that the ensem- 
ble average (xiXj 



= A i ■ , and that conjugacy implies 
Thus, effectively, every time we per- 



form a heatbath move along direction the component 
a? is randomized, without affecting the others. After a 
complete sweep across the set of directions a completely 
independent state is obtained. 

A more formal proof is provided in appendix [C] where 
it is also demonstrated that the autocorrelation function 



is 



( Xi (0)xj (t)) = ( Xi (0) 



2\ J [l-£] t<N 







t > N 



(11) 



Therefore the corresponding autocorrelation time is Tj = 
(N + 1) /2. A remarkable feature of equation (fTTj) is that 
the autocorrelation function is linear, and that after N 
moves a completely independent vector is obtained. This 
property holds also for the global heatbath method. In 
Section HTH we shall discuss the relation between our ap- 
proach and global heatbath sampling. 



B. Conjugate-gradient approach to generate 
conjugate directions 

In the last section we have shown how a heatbath 
algorithm based on conjugate directions can dramati- 
cally improve the sampling of the slow modes for an 
ill-conditioned action. An efficient strategy to generate 
these directions is the application of the conjugate gra- 
dient procedure ll| . For the sake of completeness and to 
introduce a consistent notation we give here an outline 
of the CG algorithm. One starts from a random config- 
uration and search direction, h.(°) = g(°) = R, so that 
the directions obtained and the sample vector x are in- 
dependent as required. Then, a series of directions h^ m ' 
and residuals g( m ) are generated using the recurrence re- 
lations 

g (*+i) = g W - AjA • h« h(* +1 > = g(* +1 ) + 7l • h« 



A, = 



g(») . g(») 
h^AhW 



r(«+l) . gCi+i) 
g-CO . cr(«) 
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FIG. 1: Scheme of the block algorithm described in paragraph 
II Bl squares represent eigenvectors of the action matrix, which 
need to be refreshed in order to obtain a statistically indepen- 
dent sample point; modes on the same column correspond to 
the same, degenerate eigenvalue. At every step, one of the 
vectors of a set with the same size as the biggest degenerate 
subspace is used in a conjugate gradient minimization, while 
the remaining ones are made orthogonal to the search direc- 
tions that are generated in the process. When the first vector 
approaches zero, one can start back on the second one (Fig- 
ure b)), and the process can be continued (Figures c) and d)) 
until the refresh is complete. 
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This procedure generates at every step a new direction 
hw, conjugated to all the previous ones, and it can be 
used to perform a directional heatbath move on x. It 
should be stressed that there is no need to store all the 
if the heatbath moves are performed concurrently 
with the CG minimization. The "force" Ah' 1 ' can be 
reused for performing the heatbath update (cfr. Eq. 
At a certain point the CG procedure will be over, with 
the residual g dropping to zero. The sequential sweep 
algorithm described intc the previous section can be im- 
plemented starting again from the same g' ' . 

In contrast to the global heatbath method, numeri- 
cal stability is not a major issue, since the accuracy of 
the sampling does not depend on the search directions 
being exactly conjugated. The only effect of imperfect 
conjugation would be to slightly reduce the decorrelation 
efficiency There is however a drawback to this approach. 
In order to be ergodic, the set of directions must span the 
whole space. The problem arises when there arc degen- 
erate eigenvalues, as CG converges to zero in a number 
p of iterations equal to the number of distinct eigenval- 
ues. If we keep reusing the same set of p < N directions, 
only a part of the subspaces corresponding to degenerate 
eigenvalues will be explored, and the sampling will not 
be ergodic. 

We have considered two possible ways of recovering er- 
godicity. The simplest consists in drawing a new random 



point g' ' = R every time we reset the CG search. This 
causes a deviation from the linear behavior of the auto- 
correlation functions for t « N. Non-degenerate eigen- 
values will initially converge with — 1/p instead of — 1/N 
slope, but degenerate ones will converge more slowly, and 
with exponential trend, as we are sampling random di- 
rections within every degenerate subspace. 

In order to improve the efficiency, we mix CG with 
Gram-Schmidt orthogonalization of a small set of vec- 
tors, ideally of the same size d of the largest degeneracy 
present. As discussed earlier, here Gram-Schmidt orthog- 
onalization has to be performed using the metric of A , 
which amounts to imposing conjugacy. The procedure is 
illustrated in Figured] We start from d random vectors, 
{v^} d _ v We set h(°) = g' ' = v' ' and begin a 
CG minimization. At each step we obtain a search di- 
rection and make each of the other d — 1 vectors 
conjugate to with a Gram-Schmidt procedure. This 
does not require any matrix-vector product other than 
the one necessary for the heatbath step. After p itera- 
tions the conjugate gradient will have converged and g 
will be close to zero. We can start again from the sec- 
ond vector in the pool, which meanwhile has become v' 1 ' , 
and is conjugate to all the directions visited so far. Thus, 
we set h' ' = g' ' = and start again the CG pro- 
cedure, orthogonalizing the d — 2 remaining vectors to 
hS l \ and so on and so forth. After N steps the proce- 
dure will be converged. At the successive sweep, one can 
generate again a set of random initial {v^'}. This can 
make the method more stable, at the cost of some loss 
in performance. Some savings can be made if one stores 
the conjugated v' 1 ' , and uses them in the subsequent 
sweeps, avoiding the need to repeat the GS orthogonal- 
izations (see figure [IJ. In practice, where more than one 
complete sweep is affordable, it is easy to devise adap- 
tive variations of this scheme, in which the pool of vectors 
{v' J ' } is enlarged whenever the CG minimization con- 
verges in less than N steps, so that in a few sweeps the 
optimal size to guarantee crgodicity is attained. 



II. BENCHMARKS AND COMPARISON WITH 
LOCAL HEATBATH 



In the previous section we have discussed a collec- 
tive modes heatbath method that could outperform stan- 
dard local heatbath techniques when the Hamiltonian 
has a very large condition number and sampling along 
the slower eigenmodes is required. In this section we il- 
lustrate the efficiency of our algorithm using numerical 
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experiments on a simple model for A, 



A = 1 
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(12) 



This matrix corresponds to the dynamical matrix of a 
linear chain of spring-connected masses, with periodic 
boundary conditions and an additional diagonal term to 
make the acoustic mode nonzero, b can be chosen so as 
to obtain the desired condition number. Eigcnmodes and 
eigenvalues for such a matrix are easily obtained, 



1+26 1 



2fc7T 



,(*0 



?0k 



->N/2,k 



N 



cos 2^ k < N/2 
sin k > N/2 



and projection of a state on the eigenvectors is quickly 
done via fast-Fourier transform. In Figure [2] we com- 
pare the the autocorrelation functions obtained with dif- 
ferent algorithms for a matrix of the form (|T2|) . Fig- 
ure [5] also highlights the ergodicity problems connected 
with the naive use of the conjugate gradient algorithm 
to generate the search directions, and shows how both 
the suggestions of paragraph II Bl can help in solving this 
problem. In general, a conjugate directions search speeds 
up decorrelation for the slower modes, but is less efficient 
than local hcatbath for the modes with a high eigenvalue. 
This is a direct consequence of the fact that Tr A = 1. 
An additional advantage of our method is the linear rate 
of decorrelation, which allows complete decorrelation just 
like the direct inversion of M , whereas moves along the 
Cartesian axes lead to approximatively exponential au- 
tocorrelation functions. 

We stress again that the relative efficiency of the two 
methods depends strongly on the observable being calcu- 
lated and on the actual spectrum of the Hamiltonian of 
the system. As a more realistic benchmark we will con- 
sider the evaluation of the trace of the inverse matrix, 



= Tr(A" 1 ) = (x 2 ) 



(13) 



This observable is strongly dependent on the slow modes. 

In Figure [3] we plot the ratios of the autocorrelation 
times r [Q] as obtained with local heatbath moves and 
with the block conjugate gradient version of our algo- 
rithm, as a function of changing condition number and 
system size. 



FIG. 2: Autocorrelation functions for a) the projection along 
the mode ao = 1; b) the projection along the mode cu ~ 9.8 
for a matrix of the form (|12[) with N — 100 and condition 
number k — 10 3 . Line A corresponds to local heatbath moves 
(one step stands for a complete sweep of the N coordinates) , 
lines B to D to conjugate directions moves: B is the hybrid 
conjugate gradient/Gram-Schmidt block algorithm; C corre- 
sponds to CG sweeps, with the search direction randomized 
at the beginning of every sweep; curve D corresponds to CG 
sweeps starting from the same initial vector. Conjugate di- 
rection moves decorrelate faster than local heatbath for the 
slow mode, but are less efficient for modes with higher eigen- 
value. For degenerate eigenmodes, the method used for curve 
D is not ergodic (and thus gives incorrect values for (xf)), 
and random restarts (curve C) are much less efficient than 
the hybrid (curve B) algorithm. 
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III. COMPARISON WITH GLOBAL HEATBATH 

It remains for us to discuss how our method fares 
in comparison with global heatbath. The latter re- 
quires that matrix A be decomposable in the form 
A = : M T M- This is the case in many fields[H, but 
in principle if it were necessary to decompose A this 
would add extra cost. Here we make our comparison as- 
suming that ^ is already available. In such a case, the 
two algorithms are on paper equally efficient in producing 
statistically independent samples. The global heatbath 
might offer some numerical advantages when the spec- 
trum of J^i is highly degenerate, since the number of 
CG iterations needed to solve the .Ma; = R linear sys- 
tem is p < N, as discussed earlier. Whenever a good 
preconditioner for the linear system is available, other 



6 



FIG. 3: (Color online) Comparison ol the efficiency ol local 
heatbath versus conjugate-gradient moves. The graph rep- 
resents tcg/tioc, the ratio of the autocorrelation times for 
the observable Q (|13[) ; r; oc corresponds to the value obtained 
from standard local heatbath moves (one unit of Monte Carlo 
time corresponds to a whole coordinates sweep), while tcg 
corresponds to the value obtained with moves along conju- 
gate directions, as obtained from our block algorithm with 
random restarts. The data plotted results from a linear in- 
terpolation of some simulations (labeled by ®) performed for 
an action of the form (|12|) , with varying size N and condition 
number k. 
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inversion algorithms such as the stabilized bi-conjugate 
gradient fl2| or the generalized conjugate residual may al- 
low to solve the linear system with a sufficient accuracy 
more efficiently than using CG. In this paper we make 
the comparison with conjugate gradient because of the 
close analogy with our scheme and because our method 
is aimed at problems where ill-conditioning cannot be 
otherwise relieved. 

In this respect, our method displays significant advan- 
tages. Firstly, it is more stable, because every move 
preserves the probability distribution, and the conjugate 
gradient procedure (which is known to be quite delicate 
in problems with large condition number) is only used 
to generate search directions. Instabilities in the proce- 
dure, which would cause incorrect sampling in the global 
heatbath, affect only the efficiency, and not the accuracy. 
Moreover, dividing the TV steps of an iterative inversion 
process into separate heatbath moves greatly improves 
the flexibility of the sampling scheme. To give some ex- 
amples, if one needs to perform an average on a slowly 
varying A , it is possible to perform only a partial sweep 
with fixed action, then continue with the new A, as- 
suming that eigenmodes will change slowly. It is also 
straightforward to tailor the choice of directions in order 
to optimize the convergence speed for the observable or 



TABLE I: Percentual errors in the evaluation of Q = (x 2 ) 
(equation (|13[) L extimated using a blocking analysis, for dif- 
ferent sampling methods. A corresponds to local heatbath, B 
corresponds to "hybrid" versions of our CG algorithm, with 
a pool of two vectors with random restarts, while curve C 
is obtained including the tricks described in section IIHI with 
m = 50. Different tests are performed with varying matrix 
size N, number of sampling steps T and condition number 
k. Due to the large autocorrelation time, the values of the 
error for local heatbath with N = 100 and T = 10 6 could not 
be extimated as reliably as in the other cases, and are only 
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interest. Adlcr's ovcrrelaxation [ 1 3j can be included nat- 
urally, and can help in further optimizing the autocorre- 
lation time. As an example of possible fine-tunings, let 
us recall the observable Q introduced in the previous sec- 
tion (equation (fl~3|l ). This observable depends strongly 
on the softer eigenvector of A- We have then modified 
our algorithm in the following way: we perform block 
conjugate gradient sweeps, with random resets, and we 
monitor the curvature along the direction being thcrmal- 
ized, hAh/h • h. We save the direction of minimum 
curvature encountered along the sweep, h m i„; during the 
following sweep, every m moves along the CG directions, 
one move is performed along h. m - m . As is evident from 
Figured! this trick considerably reduces the autocorrela- 
tion time for Q. Even smarter combinations of moves can 
be devised, and the one we suggest is just an example of 
how the additional flexibility gained through subdividing 
the inversion process in N exact sampling moves can be 
exploited. In Tablc|T]wc report some numerical extimatcs 
of the error in the evaluation or CI, which can serve as a 
reference to compare our method to other approaches. 



IV. CONCLUSIONS 

We have presented an algorithm for performing col- 
lective modes heatbath along conjugate directions for a 
quadratic action, which allows the components of the 
sampling vector along all modes to be decorrelatcd in N 
steps, with a linear decay to zero. This method is more 
computationally demanding than local updates, but be- 
comes competitive for ill-conditioned actions, when one 
needs to compute observables which depend on modes 
with low eigenvalues, or when the spectrum of the ac- 
tion matrix has only a few high eigenvalue modes which 
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FIG. 4: Autocorrelation function for the observable (|13[) for 
an action of the form (|12[) . with size N = 100 and condition 
number k = 5x 10 3 . Line A corresponds to local heatbath, 
line B to the "hybrid" versions of our CG algorithm, with 
a pool of two vectors with random restarts, while curve C 
is obtained including the tricks described in section Hill with 
m — 5 




200 
t [steps] 



400 



would slow down Cartesian moves. In fact, this method 
has an efficiency comparable with that of direct inver- 
sion of the matrix, but presents various advantages, such 
as improved stability, as the numerical issues connected 
with conjugate gradient method do not affect the accu- 
racy of the sampling, and the possibility of exploiting 
some additional flexibility to improve the sampling on a 
case-by-case basis. Lastly, global heatbath requires the 
knowledge of the square root of the action A, so our 
scheme should be considered whenever the square root is 
difficult to compute or its use is inefficient with respect 
to the original action. 

The geometrical simplicity of this approach, with its 
close analogy with minimization methods, also suggests 
that it might be extended to the sampling of anharmonic 
systems. 



APPENDIX A: 

We report here a simple demonstration of the fact 
that heatbath moves along a generic direction d leave 
an equilibrium probability distribution unchanged. We 
will use the fact that if R, R' and R" are vectors dis- 
tributed as Gaussians with zero mean and standard de- 
viation one, then B_R + C.R' is distributed as J2R" 
where T) 7 J2 B M3 + C~ 7 C . Since x is drawn from 
the equilibrium distribution, i.e. x = ^£ _1 R, we can 
cast Eq. ((3]) and ((2|) into the form 



p im = (Mr 1 ] 



J"> 



•Ej — ^ ] Fj m R rn ~t~ ^ ] Q jm-^rn 
m m 

dj Mkmdk Qjm = djSmO 



where we have put b = into Eq. ~~ and normalized 
the direction so that d Ad = 1 in order to simplify the 



notation. We can then compute 

^ FjmPlm = ^^jl djd[ ^ ] Q jmQhn = djdl 



that P T P + Q T Q = (M- 



M 



i.e. also x' 



may be written as M. X R, and is therefore correctly dis- 
tributed. 



APPENDIX B: 

We shall here discuss briefly the derivation of the 
asymptotic form of equation ~~ when the size N of the 
action matrix tends to infinity. The quantity to be com- 
puted is 



dx- 



■ exp 



The integral can be transformed as follows: 



In 



Qi (x J dt j dxx 2 exp 



J cut + 1 11 y/a k t + 1 ' 



and the resulting expression, including the correct nor- 
malization, is 



- / dt 

2 J ait + 1 



/(*), /(*)=n 



i 



\/a k t + 1 



(Bl) 

Let us focus on F = J Q f (t) dt, since all the Qi can be 
computed as Qi = a%§^ + \F. We perform the change 
of variables Nt — > t, so that 



f(t)dt 



1 

N 



/ f(t)dt, f(t)=\ 
Jo 



Under the physically reasonable assumption that Tr .A_ = 
O (N), and that the maximum eigenvalue does not scale 
with the system size, we can use 1/N as a small param- 
eter. Expanding log / one finds 



io g /(t) = 5>g(i+^t) = 



N 



^ n+ 1 V L/vJ ^N2 



N T 



All but the leading term become negligible for N — > oo. 
This suggests separating out from / (t) the term order 
zero in 1/N, and writing for F the expression 



N 



exp 



iT 



t 2 + o 



1 



t Tr A 

2 N 

t 3 + . . 



dt (B2) 
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which leads to the asymptotic result F = T + 
0(N~ 2 ^. Correspondingly, dropping the higher order 
terms in 1/JV, we have = + O (iV -2 ), which is 

the desired result. 



APPENDIX C: 

We obtain here the autocorrelation function for the 
components along the eigenmodes of the action matrix 
A , when performing heatbath sweeps along a set of 
conjugate directions {h ( - m - ) } TO _ N _ v In this section, 
the indices of the directions are defined modulo N, i.e. 
hy +N ' = hy>. In this case, one can write Eq. © as 



(xi(0)xi(t + l)) = ( Xi (O)xi(t)) 



m k 



(CI) 



Explicit calculations for small values of t suggest for t < 
N the ansatz 



(Xi (0)xi(t)) = (xi (0) 



1-1 

N 



(C2) 



Since the first term in Eq. (|C1|) does not contain the new 
direction, we can substitute the ansatz without concern. 
On the other hand, the second term contains reference 
to h( m \ so that the average that led to (|C2[) cannot be 
performed separately, and one should rather write: 



m k 



£>,(()) a* (t-1)) 



( 6 k , k - x [^A k , k (h^)) MA ki (h<*»> 

\ V a k V ' J V a i V 



(C3) 



which is split into 



N H 



rn k L 



(0)x fc J^A w (h( r 



(C4) 



l - ^ [(^ (0)^ (t - 1)) ^A fe , fe (h^™- 1 )) A fcl (hW) 

(C5) 



The term (|C5|) goes to zero, since 



fc m 



while (|C4[) can be expanded again, giving rise 
to the t — 2 analogue and to a term containing 
A k , k (h( m - 2 )) A fej (h(™)). One iterates this process re- 



cursively until it reaches \ Xi (0) ), thus contributing an- 
other — 1/N to the autocorrelation function. Things arc 
different for t > N, since terms involving products of the 
slopes for the same direction will enter the procedure at 
a certain point in the iteration. Because of these terms, 
for t > N autocorrelation functions will be identically 
zero. 
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